---
title:  Data Structure Basics pt.4
description: >
  Notes on associative arrays.
categories: posts
tags: [data-structures, ruby]
---

<details markdown="1" id="table-of-contents">
<summary>
Table of Contents
</summary>

* TOC
{:toc}
</details>

## Review

So far, in all previous data structures we've either use a numeric index or no
index at all, like in stacks and queues. The index ones usually are zero-based
and are good at accessing it's own objects.
Those data structures are great for keeping data in order. The downside is that
we need to know exactly where in the collection is the object we need. Which
forces us to search for it. Unfortunately these data structures aren't the best
for searching for items. To learn the best way to search those data structures
we need to talk about efficient search algorithms and that's beyond the scope of
this series of articles.

For now let's talk about data structures that are better designed for searching.

## Associative Arrays

Associative arrays give us the ability to use meaningful keys to access each
value inside the collection. This effectively creates a meaningful relationship
between those two making them a pair. That means that no matter how we sort the
pairs inside a collection keys will always return the same values. Actually,
since each key must be unique, there is no need to sort them. Values can be
almost anything and it's not required to be unique.

### Hash Values

A basic understanding of how hashing works is necessary to understand hash tables.

We can think of a hashing as in cooking. First we chop our ingredients. Then we mix
them together until we get a homogeneous dough.
Similarly, in order to produce the hash value for an object we take some of its
parts and return a simplified integer representation of the whole object. This is
very useful in data structures because we can easily find data therein.

Hashing is not encryption. Hashing functions are typically one-way since
information gets lost in the process.

Some hashing rules:

- Hashing should be deterministic under the same context; should always
return the same hash value for a given object.
- Two objects that are __equal__ should return the same hash.
- Two different objects __may__ return the same hash.

#### Hashing Collision

Although most languages know how to deal with hashing collision we should be
aware of its implications.
Imagine we have a hashing function that turns every letter in a given object into
a number based on their place in the alphabet. Numbers stay the same, all other
characters get discarded. Finally, it returns their sum.
Given the object `Sam Jones 04/04/1990` our function would compute it like
`19 + 1 + 13 + 10 + 15 + 14 + 5 + 4 + 4 + 1990` and return `2075`.
If we were to pass `Fay Adams 10/10/1985` to our function it would return the
same number.

So a hash collision happens when we have at least two objects with different data
but the same hash value.

#### Hash Values in Ruby

All instances respond to `#hash` which returns their hash value. However, when
writing our own classes we should override it with our own implementation as to
guarantee that the `==` behaves as expected specially if we override it.


### Hash Tables

The main advantage of hash tables over arrays and linked lists is the speed in
which they can find an object. They are also good at adding and deleting object.

Hash tables can be either fixed or flexible. In either case, hash tables are
usually initialized with buckets waiting for content. A 'bucket' is how we call
entries in a hash table.

Hash tables, behind the scenes, behave like arrays. When we add a new bucket to a
hash table, the key gets passed through a hashing function. Usually the hash
value gets reduced, and the number returned is the index of the array behind the
scenes. A similar thing happens when we search for an item, and like we saw back
in [arrays]({{site.url}}/posts/data-structure-basics-2), as long as we know the index of the item we are looking
for the size and flexibility of an array are irrelevant.

#### Managing Collisions

No matter how good a hashing function is, its bound to create a single hash value
for more than one object. Hash tables implement ways to deal with it. For instance,
the Separate Chaining Technique; whenever there's a collision the bucket's value turns
into a linked list or an array. Other algorithms include: open addressing, Robin
Hood hashing, and 2-choice hashing. But they are beyond the scope of this series.

### Associative Arrays in Ruby

Hash tables are implemented in Ruby via the `Hash` class. Do not confuse it the
`#hash` method mentioned above.

## Sets

This is probably the simplest data structure to use, its usually:
- An unordered collection of objects.
- No index, sequence or key is set to access an object.
- No duplicates allowed.
- Fast lookup - for checking existence, not retrieval.

Behind the scenes, sets behave similarly to hash tables. The main difference is
that the object itself gets passed through the hashing function. The hash value
is the index of the bucket where it will be stored. That's also the reason why
there can't be duplicates and why they are really fast at finding.

The reason why sets are not good at retrieving objects. If we think it through,
it makes no sense to try to retrieve the very same object we are using to
find it. Instead a set is really fast at letting us know whether it contains an
element or not.

### Sets in Ruby

Ruby has a built-in implementation of sets. Is a hybrid of Array's intuitive
inter-operation facilities and Hash's fast lookup. For further details check the
docs.

### Before we go

This time we covered the basics on associative arrays. In the next article in
this series we will talk about tree data structures.
